Goto

Collaborating Authors

 nonzero component


Storage capacity of perceptron with variable selection

arXiv.org Machine Learning

A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $ฮฑ$ and the variable selection ratio $ฯ$ for which a simple perceptron can perfectly classify $P = ฮฑN$ random patterns by optimally selecting $M = ฯN$ variables out of $N$ variables. While the Cover--Gardner theory establishes that a random subset of $ฯN$ dimensions can separate $ฮฑN$ random patterns if and only if $ฮฑ< 2ฯ$, we demonstrate that optimal variable selection can surpass this bound by developing a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.


Reviews: Sparse PCA from Sparse Linear Regression

Neural Information Processing Systems

The paper proposes an approach to reduce solving a special sparse PCA to a sparse linear regression (SLR) problem (treated as a black-box solution). It uses the spiked covariance model [17] and assumes that the number of nonzero components of the direction (u) is known, plus some technical conditions such as a restricted eigenvalue property. The authors propose algorithms for both hypothesis testing and support recovery, as well as provide theoretical performance guarantees for them. Finally, the paper argues that the approach is robust to rescaling and presents some numerical experiments comparing two variants of the method (based on SLR methods FoBa and LASSO) with two alternatives (diagonal thresholding and covariance thresholding). Strengths: - The addressed problem (sparse PCA) is interesting and important.


Reduced-Space Iteratively Reweighted Second-Order Methods for Nonconvex Sparse Regularization

arXiv.org Artificial Intelligence

This paper explores a specific type of nonconvex sparsity-promoting regularization problems, namely those involving $\ell_p$-norm regularization, in conjunction with a twice continuously differentiable loss function. We propose a novel second-order algorithm designed to effectively address this class of challenging nonconvex and nonsmooth problems, showcasing several innovative features: (i) The use of an alternating strategy to solve a reweighted $\ell_1$ regularized subproblem and the subspace approximate Newton step. (ii) The reweighted $\ell_1$ regularized subproblem relies on a convex approximation to the nonconvex regularization term, enabling a closed-form solution characterized by the soft-thresholding operator. This feature allows our method to be applied to various nonconvex regularization problems. (iii) Our algorithm ensures that the iterates maintain their sign values and that nonzero components are kept away from 0 for a sufficient number of iterations, eventually transitioning to a perturbed Newton method. (iv) We provide theoretical guarantees of global convergence, local superlinear convergence in the presence of the Kurdyka-\L ojasiewicz (KL) property, and local quadratic convergence when employing the exact Newton step in our algorithm. We also showcase the effectiveness of our approach through experiments on a diverse set of model prediction problems.


The Challenge of Differentially Private Screening Rules

arXiv.org Artificial Intelligence

Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data analysis, especially in information retrieval problems where n-grams over text with TF-IDF or Okapi feature values are a strong and easy baseline. Over the past decade, screening rules have risen in popularity as a way to reduce the runtime for producing the sparse regression weights of $L_1$ models. However, despite the increasing need of privacy-preserving models in information retrieval, to the best of our knoweledge, no differentially private screening rule exists. In this paper, we develop the first differentially private screening rule for linear and logistic regression. In doing so, we discover difficulties in the task of making a useful private screening rule due to the amount of noise added to ensure privacy. We provide theoretical arguments and experimental evidence that this difficulty arises from the screening step itself and not the private optimizer. Based on our results, we highlight that developing an effective private $L_1$ screening method is an open problem in the differential privacy literature.


Asymptotic Properties for Bayesian Neural Network in Besov Space

arXiv.org Artificial Intelligence

Neural networks have shown great predictive power when dealing with various unstructured data such as images and natural languages. The Bayesian neural network captures the uncertainty of prediction by putting a prior distribution for the parameter of the model and computing the posterior distribution. In this paper, we show that the Bayesian neural network using spike-and-slab prior has consistency with nearly minimax convergence rate when the true regression function is in the Besov space. Even when the smoothness of the regression function is unknown the same posterior convergence rate holds and thus the spike-and-slab prior is adaptive to the smoothness of the regression function. We also consider the shrinkage prior, which is more feasible than other priors, and show that it has the same convergence rate. In other words, we propose a practical Bayesian neural network with guaranteed asymptotic properties.


Does SLOPE outperform bridge regression?

arXiv.org Machine Learning

A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that MSE concentrates around takes a complicated and implicit form. With delicate analysis of the quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators are sub-optimal compared with bridge regression such as the Ridge estimator.


Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

Neural Information Processing Systems

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times. It is shown that the methods achieve essentially a sample complexity of $O(1/\varepsilon)$ to attain an error of $\varepsilon$ under a variant of restricted eigenvalue condition, and the rate has better dependency on the problem dimension than existing methods. Particularly, if the smallest magnitude of the non-zero components of the optimal solution is not too small, the rate of our proposed {\it Hybrid} algorithm can be boosted to near the minimax optimal sample complexity of {\it full information} algorithms. The core ideas are (i) efficient construction of an unbiased gradient estimator by the iterative usage of the hard thresholding operator for configuring an exploration algorithm; and (ii) an adaptive combination of the exploration and an exploitation algorithms for quickly identifying the support of the optimum and efficiently searching the optimal parameter in its support. Experimental results are presented to validate our theoretical findings and the superiority of our proposed methods.


Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

arXiv.org Machine Learning

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times. It is shown that the methods achieve essentially a sample complexity of $O(1/\varepsilon)$ to attain an error of $\varepsilon$ under a variant of restricted eigenvalue condition, and the rate has better dependency on the problem dimension than existing methods. Particularly, if the smallest magnitude of the non-zero components of the optimal solution is not too small, the rate of our proposed {\it Hybrid} algorithm can be boosted to near the minimax optimal sample complexity of {\it full information} algorithms. The core ideas are (i) efficient construction of an unbiased gradient estimator by the iterative usage of the hard thresholding operator for configuring an exploration algorithm; and (ii) an adaptive combination of the exploration and an exploitation algorithms for quickly identifying the support of the optimum and efficiently searching the optimal parameter in its support. Experimental results are presented to validate our theoretical findings and the superiority of our proposed methods.


Privacy Preserving Identification Using Sparse Approximation with Ambiguization

arXiv.org Machine Learning

A. Identification and ANN Search Many modern applications such as biometrics, digital physical object security and data generated by connected objects in the IoT require privacy preserving identification of a query with respect to a given dataset. Practically, the identification problem is based on an ANN search when a list of indices corresponding to the NN items is returned. At the final refinement stage, the list can be refined in a private setting and a single index is declared as the identified one. The identification problem faces the curse of dimensionality. For this reason, the exact identification is replaced by a search of list of closest items, i.e., one tries to tradeoff the accuracy of identification by the search complexity. In recent years, many methods providing efficient ANN solutions for multi-billion entry datasets were proposed and we named some of them without pretending to be exhaustive in our overview [1]-[3]. B. Search in Privacy Preserving Settings: Main Considerations Due to the massive amount of data, modern distributed storage and computing facilities, many ANN problems are considered in a setting where the data user outsources his datasets by applying the corresponding protection measures to third parties (servers) possessing powerful storage, communications and computing facilities. The need for data protection comes from many perspectives related to the cost of data collection, data as a "product" that represents a great value in the era of machine learning, which can be used to train and prune new and existing machine learning tools. Moreover, the server might want to discover some hidden relationships in the data.


Sparse recovery via Orthogonal Least-Squares under presence of Noise

arXiv.org Machine Learning

We consider the Orthogonal Least-Squares (OLS) algorithm for the recovery of a $m$-dimensional $k$-sparse signal from a low number of noisy linear measurements. The Exact Recovery Condition (ERC) in bounded noisy scenario is established for OLS under certain condition on nonzero elements of the signal. The new result also improves the existing guarantees for Orthogonal Matching Pursuit (OMP) algorithm. In addition, This framework is employed to provide probabilistic guarantees for the case that the coefficient matrix is drawn at random according to Gaussian or Bernoulli distribution where we exploit some concentration properties. It is shown that under certain conditions, OLS recovers the true support in $k$ iterations with high probability. This in turn demonstrates that ${\cal O}\left(k\log m\right)$ measurements is sufficient for exact recovery of sparse signals via OLS.